On SGD's Failure in Practice: Characterizing and Overcoming Stalling
Abstract
Stochastic Gradient Descent (SGD) is widely used in machine learning to perform empirical risk minimization efficiently, yet in practice SGD is known to stall before reaching the minimizer of the empirical risk. Stalling has often been attributed to SGD's sensitivity to the conditioning of the problem; however, as we demonstrate, SGD stalls even on a simple linear regression problem with unit condition number under standard learning rates. In this work, we therefore demonstrate numerically and argue mathematically that stalling is a crippling and generic limitation of SGD and its variants in practice. Having established the problem of stalling, we introduce a framework for hedging against its effects, which (1) deters SGD and its variants from stalling, (2) still provides convergence guarantees, and (3) makes SGD and its variants more practical methods for minimization.
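The abstract does not give the experimental setup, so the following is only an illustrative sketch and not the paper's experiment. It assumes a scalar least-squares problem with features equal to +1 or -1 (so the Hessian is exactly 1 and the condition number is 1), Gaussian label noise, and a 1/(k+1) learning rate as one example of a "standard" schedule. The function name `run_sgd` and all parameter values are hypothetical. It reports how far the SGD iterate is from the closed-form empirical minimizer after a fixed number of steps; whether that remaining gap matches the paper's precise definition of stalling is an assumption.

```python
import random


def run_sgd(num_points=100, num_steps=2000, w_true=3.0, noise_std=1.0, seed=0):
    """Run scalar SGD on least squares; return (initial gap, final gap) to the minimizer."""
    rng = random.Random(seed)
    # Features are +1 or -1, so x * x == 1 and the scalar Hessian is exactly 1.
    xs = [rng.choice((-1.0, 1.0)) for _ in range(num_points)]
    ys = [w_true * x + rng.gauss(0.0, noise_std) for x in xs]

    # Closed-form empirical risk minimizer: sum(x * y) / sum(x * x),
    # where sum(x * x) equals num_points for +/-1 features.
    w_star = sum(x * y for x, y in zip(xs, ys)) / num_points

    w = 0.0
    initial_gap = abs(w - w_star)
    for k in range(num_steps):
        i = rng.randrange(num_points)        # sample one data point uniformly
        grad = (xs[i] * w - ys[i]) * xs[i]   # gradient of 0.5 * (x * w - y) ** 2
        step = 1.0 / (k + 1)                 # assumed decaying learning rate
        w -= step * grad
    return initial_gap, abs(w - w_star)


if __name__ == "__main__":
    initial_gap, final_gap = run_sgd()
    print(f"initial gap: {initial_gap:.6f}, gap after SGD: {final_gap:.6f}")
```

Under these assumptions the iterate moves much closer to the minimizer than its starting point but does not land on it within the step budget; changing `num_steps` or the schedule in `step` is a simple way to explore how the remaining gap behaves.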
Journal title:
- CoRR
Volume: abs/1702.00317
Pages: -
Publication date: 2017